refactor(ingestion): drop legacy file_* columns #177
Merged
…sources

Completes the source-agnostic ingestion_jobs schema and removes the last of the file-shaped scaffolding:

- Migration drops file_path / file_name / file_size / file_mtime_ns and the inline (dataset_id, file_path) UNIQUE; external_id and fingerprint become NOT NULL, with a UNIQUE INDEX on (dataset_id, external_id) taking over as the uniqueness invariant.
- IngestionJob entity, repository upsert and IngestionJobResponse schema lose every reference to the file_* fields. The manager's _derive_legacy_file_fields helper and the dual-write threading through upsert_by_external_id are gone.
- NoOpSource declares IS_NOOP = True; IngestionManager._has_source reads it and skips spawning per-dataset tasks for externally-fed bindings (remote Weaviate today).
- Frontend: IngestionJobResponse TS interface mirrors the new shape; DatasetDetailPage renders the basename of external_id and drops the now-orphan formatFileSize helper; EndpointDetailPage filters by external_id.
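To make the new uniqueness invariant concrete, here is a minimal sketch of a table where (dataset_id, external_id) is enforced by a UNIQUE INDEX and both external_id and fingerprint are NOT NULL. It is not the PR's migration: it assumes an in-memory SQLite database through Python's sqlite3 module, the index name and the make_db helper are hypothetical, and the upsert_by_external_id signature shown here is an assumption rather than the repository's actual one.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the post-migration schema (assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE ingestion_jobs (
            id          INTEGER PRIMARY KEY,
            dataset_id  INTEGER NOT NULL,
            external_id TEXT    NOT NULL,
            fingerprint TEXT    NOT NULL
        )
        """
    )
    # Hypothetical index name; it replaces the old inline
    # (dataset_id, file_path) UNIQUE as the uniqueness invariant.
    conn.execute(
        "CREATE UNIQUE INDEX ux_ingestion_jobs_dataset_external "
        "ON ingestion_jobs (dataset_id, external_id)"
    )
    return conn


def upsert_by_external_id(
    conn: sqlite3.Connection, dataset_id: int, external_id: str, fingerprint: str
) -> None:
    """Insert a job, or refresh its fingerprint if the key already exists."""
    conn.execute(
        """
        INSERT INTO ingestion_jobs (dataset_id, external_id, fingerprint)
        VALUES (?, ?, ?)
        ON CONFLICT (dataset_id, external_id)
        DO UPDATE SET fingerprint = excluded.fingerprint
        """,
        (dataset_id, external_id, fingerprint),
    )


if __name__ == "__main__":
    db = make_db()
    upsert_by_external_id(db, 1, "docs/a.md", "v1")
    upsert_by_external_id(db, 1, "docs/a.md", "v2")  # same key: updates in place
    print(db.execute("SELECT COUNT(*) FROM ingestion_jobs").fetchone()[0])
```

Because the conflict target is the composite index, the same external_id can still appear under a different dataset_id; only a repeat within one dataset collapses into an update.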
d0c9bd0 to cd3b4ae
This pull request completes the migration of the ingestion pipeline to source-agnostic identifiers, removing legacy file-specific columns and logic throughout the backend and frontend. The ingestion_jobs table and related code now rely solely on external_id and fingerprint, both of which are required, with uniqueness enforced on (dataset_id, external_id). This simplifies the data model and APIs. The frontend is updated to match, removing references to file-specific fields.

Database schema and data model migration:

- Drops the legacy file-specific columns (file_path, file_name, file_size, file_mtime_ns) and the associated uniqueness constraint from the ingestion_jobs table. The new uniqueness constraint is on (dataset_id, external_id), and both external_id and fingerprint are now NOT NULL. The migration includes upgrade and downgrade logic.
- Updates the IngestionJob SQLModel entity to remove the deprecated file-specific fields and enforce non-nullable external_id and fingerprint. Table constraints and indexes are updated to match the new schema.

Backend logic and API updates:

- Repository upsert and ingestion manager logic drop the file_* fields and the dual-write path, and work only with external_id and fingerprint.
- The IngestionJobResponse schema drops the file_* fields and exposes external_id and fingerprint.
- Updates NoOpSource to include an IS_NOOP flag, allowing the ingestion manager to skip unnecessary tasks for externally-fed sources.
- IngestionManager._has_source reads the flag and skips spawning per-dataset tasks for NoOpSource instances, reducing unnecessary bookkeeping.

Frontend updates:

- The IngestionJobResponse TypeScript interface and the pages that consume it drop the file-specific fields and use external_id and fingerprint instead. Display logic now infers the filename from external_id.
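The IS_NOOP check can be illustrated with a short sketch. This is an assumption-laden outline, not the PR's code: the Source base class, FileSource, the module-level has_source function (standing in for IngestionManager._has_source), and datasets_to_spawn are all hypothetical names; only NoOpSource and IS_NOOP = True come from the description.

```python
from typing import Dict, List, Optional


class Source:
    """Hypothetical base class; real sources do work by default."""

    IS_NOOP = False


class NoOpSource(Source):
    """Marks a binding that is fed externally, so there is nothing to poll."""

    IS_NOOP = True


class FileSource(Source):
    """Hypothetical concrete source used only for this illustration."""


def has_source(source: Optional[Source]) -> bool:
    """Return True only for bindings that need a per-dataset task.

    Stand-in for IngestionManager._has_source; treats a missing source
    and a no-op source the same way.
    """
    return source is not None and not getattr(source, "IS_NOOP", False)


def datasets_to_spawn(bindings: Dict[str, Optional[Source]]) -> List[str]:
    """List the dataset names for which a task would be spawned."""
    return [name for name, source in bindings.items() if has_source(source)]


if __name__ == "__main__":
    bindings = {"local": FileSource(), "remote": NoOpSource(), "unbound": None}
    print(datasets_to_spawn(bindings))
```

Reading a class-level flag keeps the manager free of isinstance checks against a specific class, so any other source type could opt out of task spawning by declaring the same attribute.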